1,649 research outputs found

    Location Anonymization With Considering Errors and Existence Probability

    Mobile devices that can sense their location using GPS or Wi-Fi have become extremely popular. However, many users hesitate to provide accurate location information to unreliable third parties if doing so would disclose their identities or sensitive attribute values. Many anonymization approaches, such as k-anonymity, have been proposed to tackle this issue. Existing studies on k-anonymity usually anonymize each user's location so that the anonymized area contains k or more users. They do not, however, consider location errors or the probability that each user actually exists in the anonymized area. As a result, a specific user might be identified by untrusted third parties. We propose novel privacy and utility metrics that can treat location errors, and an efficient algorithm to anonymize the information associated with users' locations. This is the first work that anonymizes location while considering location errors and the probability that each user is actually present in the anonymized area. Simulations show that our proposed method can reduce the risk of a user's attributes being identified while maintaining the utility of the anonymized data.

    Differential Private Data Collection and Analysis Based on Randomized Multiple Dummies for Untrusted Mobile Crowdsensing

    Mobile crowdsensing, which collects environmental information from mobile phone users, is growing in popularity. These data can be used by companies for marketing surveys or decision making. However, collecting sensing data from users may violate their privacy. Moreover, the data aggregator and/or the participants of crowdsensing may be untrusted entities. Recent studies have proposed randomized response schemes for anonymized data collection. This kind of data collection makes it possible to analyze users' sensing data statistically without precise information about any individual user's sensing results. However, traditional randomized response schemes and their extensions require a large number of samples to achieve proper estimation. In this paper, we propose a new anonymized data-collection scheme that can estimate data distributions more accurately. Simulations with synthetic and real datasets show that our proposed method can reduce the mean squared error and the JS divergence by more than 85% compared with existing studies.
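The traditional randomized response idea mentioned in this abstract can be sketched as follows. This is a minimal illustration of a generic k-ary randomized response, not the scheme proposed in the paper; the function names `perturb` and `estimate_distribution` and the retention probability `p` are assumptions made for this example.

```python
import random
from collections import Counter


def perturb(value, categories, p, rng):
    """Report the true value with probability p, else a uniformly chosen other category."""
    if rng.random() < p:
        return value
    others = [c for c in categories if c != value]
    return rng.choice(others)


def estimate_distribution(reports, categories, p):
    """Unbiased estimate of the true category frequencies from perturbed reports.

    If pi_j is the true frequency of category j and q = (1 - p) / (k - 1),
    the expected reported frequency is f_j = p * pi_j + q * (1 - pi_j),
    so pi_j = (f_j - q) / (p - q). Requires p != 1 / k.
    """
    k = len(categories)
    q = (1.0 - p) / (k - 1)
    n = len(reports)
    counts = Counter(reports)
    return {c: (counts[c] / n - q) / (p - q) for c in categories}
```

With few reports the estimates are noisy and can even fall outside [0, 1], which is the sample-size limitation the abstract refers to.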

    Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness

    A number of methods for privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes. For instance, they assume that address, job, and age are QIDs but are not sensitive attributes, and that a disease name is a sensitive attribute but is not a QID. In practice, however, any of these attributes can act as both a sensitive attribute and a QID. In this paper, we refer to these attributes as sensitive QIDs, and we propose novel privacy models, namely (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, and a method that can treat sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, which is run by data holders, is simple but effective, whereas the reconstruction algorithm, which is run by data analyzers, can be tailored to each data analyzer’s objective. Our proposed method was experimentally evaluated using real data sets.
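For orientation, the conventional setting that this abstract contrasts itself with (QIDs cleanly separated from one sensitive attribute) can be sketched as a distinct l-diversity check. This is not the (l1, ..., lq)-diversity model of the paper; the function name and the record layout (a list of dictionaries) are assumptions made for this example.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, qid_keys, sensitive_key, l):
    """Return True if every group of records sharing the same QID values
    contains at least l distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for record in records:
        group_id = tuple(record[k] for k in qid_keys)
        groups[group_id].add(record[sensitive_key])
    return all(len(values) >= l for values in groups.values())
```

A table that passes this check for l = 2 guarantees only that no QID group maps to a single sensitive value; it says nothing about attributes that are both QIDs and sensitive, which is the gap the abstract targets.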

    An algorithm to reduce the communication traffic for multi-word searches in a distributed hash table

    In distributed hash tables, much communication traffic comes from multi-word searches. The aim of this work is to reduce the amount of traffic by using a Bloom filter, which is a space-efficient probabilistic data structure used to test whether or not an element is a member of a set. However, Bloom filters have a limited role if several sets have different numbers of elements. In the proposed method, extra data storage is generated when contents' keys are registered in a distributed hash table system. Accordingly, we propose a "divided Bloom filter" to solve the problem of a normal Bloom filter. Using the divided Bloom filter, we aim to reduce both the amount of communication traffic and the amount of data storage.
    4th IFIP International Conference on Theoretical Computer Science. Red de Universidades con Carreras en Informática (RedUNCI)
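The way a normal Bloom filter can cut traffic in a two-word search can be sketched as below. This is a plain Bloom filter, not the "divided Bloom filter" proposed in the paper; the class, the function `intersect_via_filter`, the default sizes, and the two-node exchange are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def intersect_via_filter(docs_a, docs_b, num_bits=1024, num_hashes=4):
    """Intersect two document-ID sets held on different nodes.

    Node A sends only its filter; node B replies with the IDs that pass it;
    node A then drops any false positives with an exact check.
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for doc_id in docs_a:
        bloom.add(doc_id)
    candidates = [doc_id for doc_id in docs_b if doc_id in bloom]  # computed at node B
    return sorted(set(candidates) & set(docs_a))  # exact check at node A
```

Because `num_bits` is fixed in this sketch, a filter sized for a large set wastes space on a small one and a filter sized for a small set yields many false positives on a large one, which illustrates the limitation the abstract describes for sets of different sizes.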

    Privacy-preserving chi-squared test of independence for small samples

    Background: The importance of privacy protection in analyses of personal data, such as genome-wide association studies (GWAS), has grown in recent years. GWAS focuses on identifying single-nucleotide polymorphisms (SNPs) associated with certain diseases such as cancer and diabetes, and the chi-squared (χ²) hypothesis test of independence can be utilized for this identification. However, recent studies have shown that publishing the results of χ² tests of SNPs or personal data could lead to privacy violations. Several studies have proposed anonymization methods for χ² testing with ε-differential privacy, which is the cryptographic community’s de facto privacy metric. However, existing methods can only be applied to 2×2 or 2×3 contingency tables; otherwise, their accuracy is low for small numbers of samples. In many cases, such as COVID-19 analysis in its early propagation stage, it is difficult to collect numerous highly sensitive samples. Results: We propose a novel anonymization method (RandChiDist), which anonymizes χ² testing for small samples. We prove that RandChiDist satisfies differential privacy. We also experimentally evaluate it using synthetic datasets and two real genomic datasets. RandChiDist achieved the fewest Type II errors among existing and baseline methods that can control the ratio of Type I errors. Conclusions: We propose a new differentially private method, named RandChiDist, for anonymizing χ² values for an I×J contingency table with a small number of samples. The experimental results show that RandChiDist outperforms existing methods for small numbers of samples.
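As background for this abstract, the ordinary (non-private) χ² statistic of independence for an I×J contingency table can be computed as sketched below. This is the textbook Pearson statistic, not RandChiDist; the function name is an assumption, and the sketch assumes every row and column total is nonzero.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for an I x J table of observed counts.

    Under independence, the expected count of cell (i, j) is
    row_total[i] * col_total[j] / n. The statistic has
    (I - 1) * (J - 1) degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

Publishing this value directly is what the abstract identifies as a privacy risk; a differentially private method would release a randomized version of it instead.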

    Differentially Private Mobile Crowd Sensing Considering Sensing Errors

    An increasingly popular class of software known as participatory sensing, or mobile crowdsensing, is a means of collecting information about people’s surroundings via mobile sensing devices. To avoid potential undesired side effects of this data analysis method, such as privacy violations, considerable research has been conducted over the last decade to develop participatory sensing that preserves privacy while analyzing information about participants’ surroundings. To protect privacy, each participant perturbs the sensed data on his or her device, and the perturbed data is then reported to the data collector. The data collector estimates the true data distribution from the reported data. As long as the data contains no sensing errors, current methods can accurately evaluate the data distribution. However, there has so far been little analysis of data that contains sensing errors. A more precise analysis that maintains privacy levels can only be achieved when a variety of sensing errors are considered.

    Proposal of a Location Anonymization Method Considering Errors

    By mining user attributes such as age, annual income, and hobbies in association with users' action histories, it becomes possible to conduct marketing and deliver advertisements suited to user attributes and location information. However, if this information reaches an attacker who knows part of a user's action history, there is a risk that the associated user attributes will be linked to the individual. To prevent such linking even against attackers who know a user's action history, many anonymization methods based on metrics such as k-anonymity have been proposed in prior work. However, these methods do not take into account that users' location information contains errors, and the risk of an individual being identified increases in environments with errors. Moreover, the utility metrics for anonymized data do not consider errors either. In this paper, assuming the realistic setting in which location information contains errors, we propose a new privacy metric, a utility metric for anonymized data, and an anonymization algorithm based on these metrics. Through simulation-based evaluation, we show that, compared with conventional methods, the proposed method improves the utility of the anonymized data while reducing the risk of individuals being identified.

    An Algorithm to Reduce the Communication Traffic for Multi-Word Searches in a Distributed Hash Table

    In distributed hash tables, much communication traffic comes from multi-word searches. The aim of this work is to reduce the amount of traffic by using a Bloom filter, which is a space-efficient probabilistic data structure used to test whether or not an element is a member of a set. However, Bloom filters have a limited role if several sets have different numbers of elements. In the proposed method, extra data storage is generated when contents' keys are registered in a distributed hash table system. Accordingly, we propose a "divided Bloom filter" to solve the problem of a normal Bloom filter. Using the divided Bloom filter, we aim to reduce both the amount of communication traffic and the amount of data storage.

    Anonymous Data Collection Satisfying l-Entropy in Ubiquitous Computing

    In ubiquitous computing environments, collecting sensed data from many users and understanding its distribution can inform national policy and corporate decision-making. However, these data may contain information that can identify individuals, so there is a risk that users' private information will leak. To address this problem, a method called the negative survey has been proposed, in which every user always provides incorrect information, protecting privacy while the server infers the true data distribution. The traditional negative survey has the drawback that it cannot estimate the distribution accurately unless information is collected from a large number of users. In recent years, several methods that can estimate the true distribution from a small number of users have been proposed, but all of them offer a relatively low level of privacy protection. In this paper, we propose a method that keeps the level of privacy protection constant and obtains information closer to the true distribution than conventional methods. Through mathematical analysis and simulations, we show that, compared with recently proposed methods, the mean squared error (MSE) can be reduced to between approximately 1/2 and 1/30.
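The traditional negative survey described in this abstract can be sketched as follows, assuming each user reports one category chosen uniformly at random from those that are not their own. This is the baseline technique, not the method proposed in the paper; the function names are assumptions made for this example.

```python
import random
from collections import Counter


def negative_report(true_value, categories, rng):
    """Return a uniformly chosen category that is NOT the user's true value."""
    return rng.choice([c for c in categories if c != true_value])


def estimate_counts(reports, categories):
    """Estimate how many users truly belong to each category.

    With n users and c categories, a user outside category j reports j
    with probability 1 / (c - 1), so E[r_j] = (n - n_j) / (c - 1)
    and the estimate is n_j = n - (c - 1) * r_j.
    """
    n = len(reports)
    c = len(categories)
    counts = Counter(reports)
    return {cat: n - (c - 1) * counts[cat] for cat in categories}
```

The estimates always sum to n, but with few users an individual estimate can be far off or even negative, which is the sample-size drawback the abstract mentions.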